Aim: run univariate and bivariate analyses using eQTL data for select loci



1 Background

Bivariate local \(r_{g}\) analyses revealed several loci (representing LD blocks) where significant \(r_{g}\)'s were found between multiple trait pairs. Of note, some LD blocks contained local \(r_{g}\)'s between neurodegenerative traits and between neuropscyhiatric traits, but with no overlap between the disorder groups. In addition, we observed LD blocks where several traits were associated with one trait in particular, but directions of effect were opposing (e.g. locus 1719, where BIP and MDD were positively correlated with SCZ, while LBD was negatively correlated with SCZ; or locus 2351, where AD and PD were positively and negatively correlated with LBD, respectively). Assuming that these assocations also correlate with expression quantitative trait loci (eQTLs), such differences may be driven by associations to different gene eQTLs.

2 Supplementary code

Following section includes any intermediary code used in this .Rmd.

2.1 Get gene loci

source(here::here("scripts", "04a_get_gene_loci.R"))

All gene loci are shown below.

gene_loci <- 
  read_delim(
    file = here::here("results", "04_eqtl_univar_bivar", "window_100000", "gene_filtered.loci"),
    delim = "\t"
    )

gene_loci %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

2.2 Pre-processing eQTL data

source(here::here("scripts", "04b_preprocess_eqtl.R"))

2.3 Get sample overlap

source(here::here("scripts", "04c_get_sample_overlaps.R"))

Subset of sample overlap matrix shown below.

sample_overlap <-
  read.table(
    file = here::here("results", "04_eqtl_univar_bivar", "sample_overlap.txt")
    )

as.matrix(sample_overlap)[1:10,1:10]
##                          AD2019  BIP2021 LBD2020 MDD2019 PD2019.meta5.ex23andMe
## AD2019                  1.00000  0.00628 0.02589 0.01946                0.01193
## BIP2021                 0.00628  1.00000 0.00800 0.05997               -0.00150
## LBD2020                 0.02589  0.00800 1.00000 0.00040                0.01281
## MDD2019                 0.01946  0.05997 0.00040 1.00000                0.00436
## PD2019.meta5.ex23andMe  0.01193 -0.00150 0.01281 0.00436                1.00000
## SCZ2018                 0.01150  0.12928 0.00661 0.03615                0.00295
## EQTLGEN_ENSG00000007047 0.00000  0.00000 0.00000 0.00000                0.00000
## EQTLGEN_ENSG00000007255 0.00000  0.00000 0.00000 0.00000                0.00000
## EQTLGEN_ENSG00000041353 0.00000  0.00000 0.00000 0.00000                0.00000
## EQTLGEN_ENSG00000048028 0.00000  0.00000 0.00000 0.00000                0.00000
##                         SCZ2018 EQTLGEN_ENSG00000007047 EQTLGEN_ENSG00000007255
## AD2019                  0.01150                       0                       0
## BIP2021                 0.12928                       0                       0
## LBD2020                 0.00661                       0                       0
## MDD2019                 0.03615                       0                       0
## PD2019.meta5.ex23andMe  0.00295                       0                       0
## SCZ2018                 1.00000                       0                       0
## EQTLGEN_ENSG00000007047 0.00000                       1                       0
## EQTLGEN_ENSG00000007255 0.00000                       0                       1
## EQTLGEN_ENSG00000041353 0.00000                       0                       0
## EQTLGEN_ENSG00000048028 0.00000                       0                       0
##                         EQTLGEN_ENSG00000041353 EQTLGEN_ENSG00000048028
## AD2019                                        0                       0
## BIP2021                                       0                       0
## LBD2020                                       0                       0
## MDD2019                                       0                       0
## PD2019.meta5.ex23andMe                        0                       0
## SCZ2018                                       0                       0
## EQTLGEN_ENSG00000007047                       0                       0
## EQTLGEN_ENSG00000007255                       0                       0
## EQTLGEN_ENSG00000041353                       1                       0
## EQTLGEN_ENSG00000048028                       0                       1

2.4 Prepare input info file

# Most input info already available, just need to add eqtls
input_info  <- 
  read_delim(
    file = file.path(here::here("results", "01_input_prep"),
                     "input.info.txt"),
    delim = "\t"
  )

file_paths <- 
  list.files(
      path = here::here("results", "04_eqtl_univar_bivar", "qtl_files"),
      pattern = "lava.gz",
      full.names = T
    )

input_info <-
  tibble(
    phenotype = 
      basename(file_paths) %>% 
      str_remove(".lava.gz"),
    cases = rep(1, times = length(file_paths)),
    controls = rep(0, times = length(file_paths)),
    filename = file_paths
  ) %>% 
  dplyr::bind_rows(
    input_info
  )

write_delim(
  input_info,
  path = file.path(here::here("results", "04_eqtl_univar_bivar"),
                   "input.info.txt"),
  delim = "\t"
)

2.5 Run univariate and bivariate tests

This was run using nohup:

# Have to navigate to root project folder for script to work (as it uses here package)
cd /home/rreynolds/misc_projects/neurodegen-psych-local-corr

nohup Rscript \
/home/rreynolds/misc_projects/neurodegen-psych-local-corr/scripts/04d_run_univar_bivar_test.R \
&>/home/rreynolds/misc_projects/neurodegen-psych-local-corr/logs/04d_run_univar_bivar_test_multwindows.log&

2.6 Loading results

Once run, results can be loaded using the following code chunk:

results <- 
  setNames(
    vector(mode = "list", length = 2),
      nm = c("univ", "bivar")
  ) %>% 
  lapply(., function(x){
    
    setNames(
      vector(mode = "list", length = 2),
      nm = c("window_100000", "window_50000")
    )
    
  })

for(test in names(results)){
  
  for(window in names(results[[test]]))
    
    results[[test]][[window]] <- 
      setNames(
        object = 
          list.files(
            path = 
              here::here(
                "results",
                "04_eqtl_univar_bivar",
                window,
                test
              ),
            pattern = ".lava.rds", 
            full.names = T
          ) %>% 
          lapply(., function(file) readRDS(file)),
        nm = 
          list.files(
            path = 
              here::here(
                "results",
                "04_eqtl_univar_bivar",
                window,
                test
              ),
            pattern = ".lava.rds"
          ) %>% 
          str_remove(., "\\..*.lava.rds")
      ) %>% 
      purrr::discard(is.null) %>% 
      qdapTools::list_df2df(col1 = "eqtl_gene") %>% 
      dplyr::rename(
        gene_locus = locus
      ) %>% 
      tidyr::separate(
        col = eqtl_gene,
        into = c("eqtl_dataset", NA),
        sep = ":"
      ) %>% 
      dplyr::inner_join(
        gene_filtered_loci %>% 
          dplyr::select(
            ld_block = locus, gene_id, gene_name
          ),
        by = c("gene_locus" = "gene_id")
      ) %>% 
      dplyr::select(
        ld_block, eqtl_dataset, gene_locus, gene_name, everything()
      ) %>% 
      dplyr::arrange(ld_block, gene_locus, eqtl_dataset) %>% 
      as_tibble()
  
}

2.7 Loading plotting functions

source(here::here("R", "plots.R"))

3 Methods

3.1 Deriving loci for eQTL analyses

Firstly, we determined a number of LD blocks of interest from the LD blocks highlighted by bivariate local \(r_{g}\) analyses. LD blocks included:

  • Two LD blocks where AD and PD were correlated, but the correlation was either positive or negative.
    • Locus 681, containing SNCA, which is of interest to PD
    • Locus 1273, containing CLU, which is of interest to AD
  • An LD block where correlations were observed among neurodegenerative traits and among neuropsychiatric traits, but not between disorder groups, i.e. locus 2281, containing RAB27B
  • Two LD blocks where several traits were associated with one trait in particular, but directions of effect were opposing.
    • Locus 1719, containing DRD2
    • Locus 2351, containing APOE

From these LD blocks of interest, we defined gene loci. That is, a 100 kb window was added to the start and end of any gene that overlapped a locus of interest. These gene-defined loci (\(n\) = 92) were carried forward in downstream analyses. Further, to ensure that our results were not driven by window size, we re-ran all analyses using a 50 kb window.

3.2 Sample overlap

Due to the potential sample overlap between cohorts and its impact on any estimated correlations, it is advised to use cross-trait LDSC (PMID:25642630; PMID:26414676) to obtain an estimate of the sample overlap. This, however, is not possible with eQTL summary statistics. Provided a reasonable certainty that there is no sample overlap between eQTL and GWAS summary statistics, it can be assumed to be zero (which precludes any correlations between eQTLGen and PsychENCODE, as both use GTEx samples, albeit from different tissues). This, however, needs double-checking in terms of eQTL and GWAS cohorts.

3.3 Running univariate and bivariate tests using eQTL summary statistics

For each gene locus, only those traits that were found to have significant local \(r_{g}\) in the associated LD block were used for univariate and bivariate analyses with eQTL summary statistics. As described in 02_run_univar_bivar_test.html, a univariate test was performed as a filtering step for bivariate local \(r_{g}\) analyses. Bivariate local correlations were only performed (i) if the eQTL within the gene locus exhibited a significant univariate local genetic signal and (ii) for pairs of traits which both exhibited a significant univariate local genetic signal. A cut-off of p < 0.0005434783 was used to determine univariate significance. This resulted in a total of **** bivariate tests spanning 55 distinct gene loci. Bivariate results were multiple test corrected using two strategies: (i) a more stringent Bonferroni correction (i.e. p < and (ii) a more lenient FDR correction.

4 Results

4.1 Tabular results

4.1.1 Univariate results

print("Univariate column descriptions:")
## [1] "Univariate column descriptions:"
tibble(
  column = colnames(results$univ$window_100000),
  description = 
    c(
      "LD block",
      "eQTL dataset used",
      "Gene locus as identified by ensembl ID",
      "Gene name",
      "Gene locus chromosome",
      "Gene locus start base pair",
      "Gene locus end base pair",
      "The number of SNPs within the locus",
      "The number of PCs retained within the locus",
      "Analysed phenotype",
      "Observed local heritability",
      "P-value from the univariate test (F-test for continuous, Chi-sq for binary)"
    )
) %>%
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')
print("Univariate results (p < 0.05/n_loci):")
## [1] "Univariate results (p < 0.05/n_loci):"
results$univ <- 
  results$univ %>% 
  lapply(., function(df){
    
    df %>% 
      dplyr::mutate(
        phen = 
          case_when(
            !str_detect(phen, "ENSG") ~ phen %>% 
              str_replace_all("[:digit:]", "") %>% 
              str_remove("\\..*"),
            str_detect(phen, "ENSG") ~ str_remove(phen, ".*_")
          )
      )
    
  })
  

results$univ %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  dplyr::filter(p < 0.05/nrow(gene_loci)) %>%
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

4.1.2 Bivariate results

  • Multiple test correction is complicated by the many overlapping gene loci and the fact that we're repeatedly testing the same disease phenotypes within one LD block (and simply substituting the eQTL-gene)
  • Thus, applied to multiple test correction strategies:
    • A stringent Bonferroni, correcting for all bivariate tests performed
    • A lenient FDR, correcting for all bivariate tests performed
results$bivar <-
  results$bivar %>% 
  lapply(., function(df){
    
    df %>% 
      dplyr::mutate(
        across(
          .cols = contains("phen"),
          ~ case_when(
            !str_detect(.x, "ENSG") ~ .x %>% 
              str_replace_all("[:digit:]", "") %>% 
              str_remove("\\..*"),
            str_detect(.x, "ENSG") ~ str_remove(.x, ".*_")
          )
        )
      ) %>% 
      dplyr::mutate(
        fdr = p.adjust(p, method = "fdr"),
        bonferroni = p.adjust(p, method = "bonferroni")
      )
    
  })

# Pivot results
results_long <- 
  results$bivar %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  tidyr::pivot_longer( 
    cols = -c("window_size", "ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
    names_to = "feature",
    values_to = "value"
  ) %>% 
  tidyr::pivot_wider(names_from = "window_size")

print("Bivariate column descriptions:")
## [1] "Bivariate column descriptions:"
tibble(
  column = colnames(results$bivar$window_100000),
  description = 
    c(
      "LD block",
      "eQTL dataset used",
      "Gene locus as identified by ensembl ID",
      "Gene name",
      "Gene locus chromosome",
      "Gene locus start base pair",
      "Gene locus end base pair",
      "The number of SNPs within the locus",
      "The number of PCs retained within the locus",
      "Phenotype 1",
      "Phenotype 2",
      "Standardised coefficient for the local genetic correlation",
      "Lower 95% confidence estimate for rho",
      "Upper 95% confidence estimate for rho",
      "Equivalent of taking the square of rho. Denotes the proportion of variance in genetic signal for phen1 explained by phen2 (and vice versa)",
      "Lower 95% confidence estimate for r2",
      "Upper 95% confidence estimate for r2",
      "Simulation p-values for the local genetic correlation",
      "FDR-corrected p-values",
      "Bonferroni-corrected p-values"
    )
) %>%
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: wrap')
print("Bivariate results (FDR < 0.05):")
## [1] "Bivariate results (FDR < 0.05):"
bivar_fdr <-
  results$bivar %>% 
  lapply(., function(df) df %>% dplyr::filter(fdr < 0.05))
  

bivar_fdr %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')
print("Bivariate results (Bonferroni p < 0.05):")
## [1] "Bivariate results (Bonferroni p < 0.05):"
bivar_bonf <-
  results$bivar %>% 
  lapply(., function(df) df %>% dplyr::filter(bonferroni < 0.05))

bivar_bonf %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

4.1.3 Overlapping bivariate results between windows

  • This includes all results that were FDR < 0.05 when using either a window size of 50 kb or 100 kb. Note that there may be some gene loci where a significant genetic correlation was observed using one window size, but not the other.
overlap <- 
  bivar_fdr$window_100000 %>%
  dplyr::select(
    -chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
  ) %>% 
  dplyr::left_join(
    results$bivar$window_50000 %>% 
      dplyr::select(
        -chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
      ),
    by = c("ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
    suffix = c("_100000", "_50000")
  ) %>% 
  dplyr::bind_rows(
    bivar_fdr$window_50000 %>%
      dplyr::select(
        -chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
      ) %>% 
      dplyr::left_join(
        results$bivar$window_100000 %>% 
          dplyr::select(
            -chr, -start, -stop, -contains("n_"), -contains("lower"), -contains("upper")
          ),
        by = c("ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
        suffix = c("_50000", "_100000")
      )
  ) %>% 
  select(
    ld_block:phen2, contains("rho"), contains("r2"), contains("p"), contains("fdr"), contains("bonferroni")
  ) %>% 
  dplyr::distinct()

overlap %>% 
  dplyr::arrange(ld_block, gene_name) %>% 
  DT::datatable(rownames = FALSE,
                options = list(scrollX = TRUE),
                class = 'white-space: nowrap')

4.2 Comparison of bivariate local \(r_{g}\)'s across 50 kb and 100 kb windows

4.2.1 Text

  • Bivariate \(r^{2}\), \(\rho\) and -log10(p-value) estimates between phenotypes that could be tested using gene loci with a 50 kb or 100 kb window were significantly positively correlated across the two window sizes (Figure 4.1). Notably, \(r^{2}\) estimates (i.e. the proportion of variance in genetic signal for one phenotype explained by the other) tended to be higher when using the 50 kb window, as compared to the 100 kb window, as evidenced by the fitted line falling below the equivalent of \(y = x\).
  • Using the more stringent Bonferroni cut-off, we detect a total of 42 significant bivariate local \(r_{g}\)'s across 29 distinct gene loci, which replicate irrespective of the window size used to define gene loci (Figure 4.2).
    • This equates to 48% of the significant 88 bivariate local \(r_{g}\)'s across both window sizes.
    • Alternatively, this equates to 57% of the 74 significant bivariate local \(r_{g}\)'s using the 100 kb window and 75% of the 56 significant bivariate local \(r_{g}\)'s using the 50 kb window.
  • By comparison, using the more lenient FDR cut-off, we detect a total of 83 significant bivariate local \(r_{g}\)'s across 41 distinct gene loci, which replicate irrespective of the window size used to define gene loci (Figure 4.2).
    • This equates to 51% of the significant 162 bivariate local \(r_{g}\)'s across both window sizes.
    • Alternatively, this equates to 61% of the 135 significant bivariate local \(r_{g}\)'s using the 100 kb window and 75% of the 110 significant bivariate local \(r_{g}\)'s using the 50 kb window.
  • As seen with \(r^{2}\) estimates from all phenotype pairs that could be tested across both window sizes, bivariate local \(r_{g}\)'s that were significant (Bonferroni-corrected p < 0.05) across both window sizes tended to have a higher \(r^{2}\) estimate when using the 50 kb window, as compared to the 100 kb window (Figure 4.3).
  • Irrespective of stringency used, more significant bivariate local \(r_{g}\)'s were identified using the 100 kb window. Further, there were some significant bivariate local \(r_{g}\)'s that were only identified when using one of the window sizes (Figure 4.2).

4.2.2 Figures

results_long %>% 
  dplyr::filter(
    !feature %in%
      c("chr", "start", "stop", "fdr", "bonferroni", "n_pcs", "n_snps"),
    !str_detect(feature, "upper"),
    !str_detect(feature, "lower")
  ) %>% 
  dplyr::mutate(
    across(
      .cols = contains("window"),
      ~ case_when(
        feature == "p" ~ -log10(.x),
        TRUE ~ .x
      )
    )
  ) %>%
  dplyr::mutate(
    feature = 
      case_when(
        feature == "p" ~ "-log10(p)",
        TRUE ~ feature
      ),
    phenotype_corr =
      case_when(
        str_detect(phen2, "ENSG") ~ "gwas-eqtl",
        TRUE ~ "gwas-gwas"
      )
  ) %>% 
  ggplot(
    aes(
      x = window_50000,
      y = window_100000
    )
  ) + 
  geom_point(
    aes(colour = phenotype_corr),
    alpha = 0.4
    ) + 
  geom_smooth(
    method = "lm", 
    formula = "y~x",
    level = 0.99, 
    colour = "black", 
    fill = "grey"
    ) +
  ggpubr::stat_cor(
    method = "pearson",
    cor.coef.name = "R"
  ) +
  geom_abline(
    intercept = 0,
    linetype = "dashed",
    colour = "#EE442F"
    ) +
  facet_wrap(
    vars(feature),
    scales = "free"
  ) +
  labs(
    x = "50 kb window",
    y = "100 kb window"
  ) +
  theme_rhr
Scatter plot of -log10(p-value), r2 and rho for each pair of phenotypes that could be tested across gene loci with a 50 kb or 100 kb window. Points are coloured by the type of phenotypes correlated e.g. GWAS-GWAS or GWAS-eQTL. The black line represents a linear model fitted to the data, with the 99% confidence interval indicated with a grey fill. The red dashed line represents the line y = x.

Figure 4.1: Scatter plot of -log10(p-value), r2 and rho for each pair of phenotypes that could be tested across gene loci with a 50 kb or 100 kb window. Points are coloured by the type of phenotypes correlated e.g. GWAS-GWAS or GWAS-eQTL. The black line represents a linear model fitted to the data, with the 99% confidence interval indicated with a grey fill. The red dashed line represents the line y = x.

results$bivar %>% 
  qdapTools::list_df2df(col1 = "window_size") %>% 
  tidyr::pivot_longer( 
    cols = -c("window_size", "ld_block", "eqtl_dataset", "gene_locus", "gene_name", "phen1", "phen2"),
    names_to = "feature",
    values_to = "value"
  ) %>% 
  dplyr::mutate(
    window_size =
      window_size %>% 
      readr::parse_number() %>% 
      format(scientific = FALSE) %>% 
      as.factor()
  ) %>% 
  dplyr::filter(
    feature %in% c("bonferroni", "fdr"),
    value < 0.05
  ) %>% 
  dplyr::count(window_size, feature, name = "n_total") %>% 
  dplyr::mutate(
    unique =
      case_when(
        feature == "bonferroni" ~ n_total - (overlap %>% dplyr::filter(bonferroni_100000 < 0.05 & bonferroni_50000 < 0.05) %>% nrow()),
        feature == "fdr" ~ n_total - (overlap %>% dplyr::filter(fdr_100000 < 0.05 & fdr_50000 < 0.05) %>% nrow())
      ),
    shared =
      n_total-unique
  ) %>% 
  tidyr::pivot_longer(
    cols = unique:shared,
    names_to = "sharing",
    values_to = "n"
  ) %>% 
  ggplot(
    aes(
      x = window_size,
      y = n,
      fill = sharing
    )
  ) + 
  geom_col() +
  facet_grid(cols = vars(feature)) +
  labs(
    x = "Window size (kb)",
    y = "Number of significant bivariate local rg"
  ) +
  theme_rhr
Number of significant bivariate local genetic correlations across window sizes, using either the stringent bonferroni cut-off or the more lenient FDR cut-off. Bars are coloured by whether local genetic correlations are significant across both window sizes (shared) or only one (unique).

Figure 4.2: Number of significant bivariate local genetic correlations across window sizes, using either the stringent bonferroni cut-off or the more lenient FDR cut-off. Bars are coloured by whether local genetic correlations are significant across both window sizes (shared) or only one (unique).

a <-
  results_long %>% 
  inner_join(
    overlap %>%
      dplyr::filter(bonferroni_100000 < 0.05 & bonferroni_50000 < 0.05) %>% 
      dplyr::select(ld_block, eqtl_dataset, gene_locus, gene_name, phen1, phen2)
  ) %>% 
  dplyr::filter(feature == "r2") %>% 
  dplyr::rename(
    `50` = window_50000,
    `100` = window_100000
  ) %>% 
  ggpubr::ggpaired(
    cond1 = "50",
    cond2 = "100",
    fill = "condition",
    line.color = "gray", 
    line.size = 0.4,
    ggtheme = theme_rhr
  ) + 
  labs(
    x = "Window size (kb)",
    y = "r2"
  ) +
  theme(legend.position = "none")

b <- 
  results_long %>% 
  inner_join(
    overlap %>%
      dplyr::filter(bonferroni_100000 < 0.05 & bonferroni_50000 < 0.05) %>% 
      dplyr::select(ld_block, eqtl_dataset, gene_locus, gene_name, phen1, phen2)
  ) %>%
  dplyr::filter(
    feature %in% c("r2"), 
    !is.na(window_50000), 
    !is.na(window_100000)
  ) %>% 
  dplyr::mutate(
    diff = window_100000 - window_50000
    ) %>% 
  ggplot(aes(x = diff)) + 
  geom_density() + 
  labs(
    x = "Difference in r2 between two window sizes\n(diff = 100 kb - 50 kb)",
    y = "Density"
  ) +
  theme_rhr

ggpubr::ggarrange(
  a,b,
  labels = letters[1:2]
)
(a) Boxplot displaying r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (Bonferroni-corrected p < 0.05) across both window sizes. Grey lines indicate paired estimates. (b) Density plot displaying difference in r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (Bonferroni-corrected p < 0.05) across both window sizes.

Figure 4.3: (a) Boxplot displaying r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (Bonferroni-corrected p < 0.05) across both window sizes. Grey lines indicate paired estimates. (b) Density plot displaying difference in r2 estimates between phenotype pairs that demonstrated significant bivariate local genetic correlation (Bonferroni-corrected p < 0.05) across both window sizes.

4.3 Significant bivariate local \(r_{g}\)'s at a global level (100 kb window)

4.3.1 Text

  • Using the more stringent Bonferroni cut-off, we detect a total of 74 significant bivariate local \(r_{g}\)'s across 41 distinct gene loci. When pairs of disease phenotypes (which were tested across multiple genes within an LD block) are collapsed within an LD block, we detect a total of 25 significant bivariate local \(r_{g}\)'s.
  • By comparison, when the more lenient FDR cut-off, we detect a total of 135 significant bivariate local \(r_{g}\)'s across 47 distinct gene loci. When pairs of disease phenotypes (which were tested across multiple genes within an LD block) are collapsed within an LD block, we detect a total of 51 significant bivariate local \(r_{g}\)'s.
  • Of the two eQTL datasets, the dataset with the highest number of gene loci (i.e. a gene where eQTLs were detected) was PsychENCODE (eQTLGen, n gene loci = 47; PsychENCODE, n gene loci = 87). In spite of this, the eQTL dataset with highest number of significant bivariate local \(r_{g}\)'s was eQTLGen, irrespective of the stringency of multiple test correction (Figure 4.4, 4.5). This likely reflects the bias in the selection of the LD blocks, which were weighted towards the AD/PD phenotypes, where we might expect to see a greater association with blood-derived molecular phenotypes.
  • Given that gene loci might vary in size, we plotted the number of SNPs within a locus and the p-value from the univariate test to determine whether there was a relationship between locus size and univariate p-value. There was no apparent relationship between the two (as determined by visual inspection, Figure 4.6).

4.3.2 Figures

data_to_plot <-
  bivar_bonf$window_100000 %>% 
  dplyr::filter(!str_detect(phen2, "ENSG")) %>% 
  dplyr::distinct(ld_block, phen1, phen2, rho) %>% 
  dplyr::group_by(ld_block, phen1, phen2) %>% 
  dplyr::summarise(rho = mean(rho)) %>% 
  dplyr::bind_rows(
    bivar_bonf$window_100000 %>% 
      dplyr::filter(str_detect(phen2, "ENSG")) %>% 
      dplyr::distinct(eqtl_dataset, ld_block, phen1, phen2, rho) %>% 
      dplyr::mutate(
        phen2 = 
          case_when(
            str_detect(phen2, "ENSG") ~ eqtl_dataset,
            TRUE ~ phen2
          )
      ) %>% 
      dplyr::select(ld_block, phen1, phen2, rho)
  ) %>% 
  dplyr::ungroup()

plot_bivar_chord_diagram(
  bivar_corr = data_to_plot,
  fct_phen = c(fct_disease, "EQTLGEN", "PSYCHENCODE"),
  palette = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
)
Chord diagram showing the number of **distinct** significant bivariate local genetic correlations between each of the traits across all LD blocks (Bonferroni-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.

Figure 4.4: Chord diagram showing the number of distinct significant bivariate local genetic correlations between each of the traits across all LD blocks (Bonferroni-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.

data_to_plot <-
  bivar_fdr$window_100000 %>% 
  dplyr::filter(!str_detect(phen2, "ENSG")) %>% 
  dplyr::distinct(ld_block, phen1, phen2, rho) %>% 
  dplyr::group_by(ld_block, phen1, phen2) %>% 
  dplyr::summarise(rho = mean(rho)) %>% 
  dplyr::bind_rows(
    bivar_fdr$window_100000 %>% 
      dplyr::filter(str_detect(phen2, "ENSG")) %>% 
      dplyr::distinct(eqtl_dataset, ld_block, phen1, phen2, rho) %>% 
      dplyr::mutate(
        phen2 = 
          case_when(
            str_detect(phen2, "ENSG") ~ eqtl_dataset,
            TRUE ~ phen2
          )
      ) %>% 
      dplyr::select(ld_block, phen1, phen2, rho)
  ) %>% 
  dplyr::ungroup()

plot_bivar_chord_diagram(
  bivar_corr = data_to_plot,
  fct_phen = c(fct_disease, "EQTLGEN", "PSYCHENCODE"),
  palette = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
)
Chord diagram showing the number of **distinct** significant bivariate local genetic correlations between each of the traits across all LD blocks (FDR-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.

Figure 4.5: Chord diagram showing the number of distinct significant bivariate local genetic correlations between each of the traits across all LD blocks (FDR-corrected p < 0.05). Pairs of disease phenotypes (which were tested across multiple genes within an LD block) were collapsed and their average rho within the LD block used. Positive and negative correlations are coloured red and blue, respectively.

results$univ$window_100000 %>% 
  dplyr::mutate(
    phen =
      case_when(
        str_detect(phen, "ENSG") ~ eqtl_dataset,
        TRUE ~ phen
      )
  ) %>% 
  ggplot(
    aes(
      x = n_snps,
      y = -log10(p)
    )
  ) +
  geom_point(alpha = 0.5) +
  scale_x_log10() +
  scale_colour_manual(
    values = c(RColorBrewer::brewer.pal(n = 6, name = "BrBG"), "black", "grey")
  ) +
  facet_wrap(vars(phen)) +
  labs(
    x = "Number of SNPs within the locus (logarithmic scale)",
    y = "-log10(p-value)"
  ) +
  theme_rhr
Scatter plot of number of SNPs in a tested gene locus and the p-value from the univariate test for each trait.

Figure 4.6: Scatter plot of number of SNPs in a tested gene locus and the p-value from the univariate test for each trait.

4.4 Phenotype correlations at gene locus level (100 kb window) using lenient FDR multiple test correction

4.4.1 Text

Significance was determined using the more lenient FDR multiple test correction. Results that do not replicate using the more stringent Bonferroni multiple test correction should be interpreted with caution.

  • Locus 681
    • Across the entire LD block, a negative local \(r_{g}\) was observed between AD and PD.
    • When locus 681 was subsetted by gene locus, a significant negative \(r_{g}\) between AD and PD was only observed in the SNCA gene locus.
    • In addition, a significant negative \(r_{g}\) was observed between PD and eQTLGen-derived SNCA eQTLs (implying that increased PD risk is correlated with decreased SNCA expression).
      • While initially this may seem contrary to expectation, given that SNCA duplication/triplication events cause familial PD, it is important to bear in mind that this correlation is using blood-derived SNCA eQTLs. SNCA transcript counts in circulating blood cells have been observed to be reduced in early PD, an association that was seen in two PD case-control studies (Harvard Biomarker Study and the multicentre PROBE study) and a study of patients near disease onset (PPMI; PMID: 26220939). Importantly, this reduction in indviduals with PD remained significant after adjusting for covariates of gender, white and red blood cell counts.
      • Worth noting that the \(r^2\) (proportion of variance in genetic signal for one phenotype explained by the other) was \(r^2\) = 0.02 (as compared to the \(r^2\) between AD and PD at this locus, which was \(r^2\) = 0.3).
  • Locus 1273
    • Across the entire LD block, a positive local \(r_{g}\) was observed between AD and PD.
    • When locus 1273 was subsetted by gene locus, a significant positive local \(r_{g}\) was observed between AD and PD in 6 out of 15 gene loci (in the remaining gene loci, local \(r_{g}\)'s between AD and PD were non-significant). This included the SCARA5 gene locus, where significant negative local \(r_{g}\)'s were observed between AD and eQTLGen-derived SCARA5 eQTLs, as well as, PD and eQTLGen-derived SCARA5 eQTLs.
      • SCARA5 is a ferritin receptor that internalises ferritin (protein that stores iron), after which iron is liberated and transported to the cytosol. It is important for non-transferrin iron delivery (PMID: 19154717).
      • This is notable, as ferritin has found to be increased in AD and PD and, more generally, several studies have implicated cellular iron overload and iron-induced oxidative stress in AD, PD and other neurodegenerative diseases (as reviewed in PMID: 28154410).
    • A significant positive local \(r_{g}\) between AD and PD was also observed at the ELP3 gene locus, together with a positive local \(r_{g}\) between PD and eQTLGen-derived ELP3 eQTLs. ELP3 is a component of the RNA polymerase II elongator complex (specifically it is the catalytic tRNA acetyltransferase subunit), which is required for multiple tRNA modifications. It is thought to be involved in neurogenesis (PMID: 19185337). Emil Gustavsson mentioned that ELP3 was a PD candidate risk gene he had been working on, but was unable to link to PD risk.
    • Notably, while no local \(r_{g}\) was observed between AD and PD in the CLU locus, a significant positive local \(r_{g}\) was observed between AD and eQTLGen-derived CLU eQTLs and a negative local \(r_{g}\) was observed between PD and eQTLGen-derived CLU eQTLs. CLU has been associated with AD (PMID: 30872998), although the mechanism by which it might confer risk or protection is not known.
  • Locus 1719
    • Across the entire LD block, a positive local \(r_{g}\) was observed between BIP, MDD and SCZ and a negative local \(r_{g}\) between SCZ and LBD.
    • This relationship was almost entirely replicated in the NCAM1 locus, the exception being that no positive local \(r_{g}\) was observed between BIP and MDD. In this locus, however, a significant positive local \(r_{g}\) was observed between BIP and eQTLGen-derived NCAM1 eQTLs.
      • NCAM1 is a cell adhesion protein, which is a member of the immunoglobulin superfamily, and is involved in cell-to-cell interactions as well as cell-matrix interactions during development and differentiation. It is best known as a marker of neural lineages, but is also expressed in the hematopoietic system (PMID: 28791027).
      • Genetic variation in NCAM1 has previously been associated with increased risk of BIP in Japanese individuals (PMID: 15050861).
    • The positive local \(r_{g}\) observed between MDD and SCZ was also replicated in the ANKK1 locus. In addition, a significant negative local \(r_{g}\) was observed between both neuropsychiatric disorders and eQTLGen-derived ANKK1 eQTLs, an observation which was replicated for MDD and SCZ using ANKK1 PsychENCODE-derived eQTLs.
      • ANKK1 encodes a protein belonging to the Ser/Thr protein kinase family.
      • This gene is closely linked to DRD2 gene, and a well-studied restriction fragment length polymorphism designated TaqIA, was originally associated with the DRD2 gene, however, later was determined to be located in exon 8 of ANKK1 gene, where it causes a nonconservative amino acid substitution (PMID: 15146457, 18621654).
      • TaqIA polymorphism thought to be a marker of both DRD2 and ANKK1 genetic variant. Has been linked to: alcoholism, antisocial traits, schizophrenia, eating disorders, and some behavioral childhood disorders (PMID: 19526298).
    • Worth noting that DRD2 is not expressed in whole blood (GTEx v8, median TPM = 0, n = 755), which would explain why no eQTLs were found to regulate DRD2 in the eQTLGen dataset. In the PsychENCODE dataset, eQTLs were detected and had sufficient local heritability (\(h^2\) = 0.025, p = 0.0021), but no significant local \(r_{g}\)'s were detected.
  • Locus 2281
    • Across the entire LD block, a positive local \(r_{g}\) was observed between (i) BIP and SCZ, (ii) MDD and SCZ and (iii) AD and PD.
    • The positive local \(r_{g}\) observed between MDD and SCZ was replicated in the TCF4 gene locus. In addition, a significant negative local \(r_{g}\) was observed between (i) MDD and eQTLGen-derived TCF4 eQTLs and (ii) AD and eQTLGen-derived TCF4 eQTLs.
    • In the RAB27B gene locus, a positive local \(r_{g}\) was observed between (i) BIP and SCZ and (ii) BIP and MDD.
    • In addition, a negative local \(r_{g}\) was observed between MDD and eQTLGen-derived RAB27B eQTLs. Notably, when using PsychENCODE-derived RAB27B eQTLs, a positive local \(r_{g}\) was observed between MDD and RAB27B eQTLs. In other words, the direction of effect changed depending on the tissue.
  • Locus 2351
    • Across the entire LD block, a positive local \(r_{g}\) was observed between AD and LBD and a negative local \(r_{g}\) between PD and LBD. While the correlation between AD and PD did not pass the Bonferroni cut-off of p = 0.0001666667, it was nominally significant at p = 0.000119, with a rho = -0.18.
    • While each of these correlations was not replicated in every gene locus with a significant local \(r_{g}\) between a disease phenotype and eQTL, where they were, they shared the same direction of effect as the local \(r_{g}\) observed across the entire LD block.
    • A total of 16 gene loci contained a local \(r_{g}\) between a disease phenotype and eQTL.
    • No significant local \(r_{g}\) was observed between APOE eQTLs and any of the disease traits. In part this was because there was insufficient heritability in the region using PsychENCODE eQTLs (univariate p-value = 0.28). While there was sufficient heritability using eQTLGen eQTLs, bivariate p-values did not pass FDR-correction (AD-APOE, rho = 0.18, p = 0.039; LBD-APOE, rho = 0.09, p = 0.374; PD-APOE, rho = -0.22, p = 0.063). However, as seen at the level of LD block, AD and LBD were positively correlated, while (i) AD and PD and (ii) LBD and PD were negatively correlated.
    • At the TOMM40 gene locus (which is adjacent to the APOE gene locus), disease phenotypes were correlated as observed at the level of LD block. In addition significant positive local \(r_{g}\)'s were observed between eQTLGen-derived TOMM40 eQTLs and both AD and LBD (TOMM40 has been previously associated with AD). Similar local \(r_{g}\) patterns were seen at the PVRL2 gene locus, which is adjacent to the TOMM40 gene locus. Notably, haplotypes in the PVRL2 have been found to exert risk effects on AD in an APOE-E4 and APOE-E2 genotype-independent manner (PMID: 31346172)
    • Other notable eQTL-trait associations include:
      • A positive correlation between AD and GEMIN7 eQTLs (across both eQTLGen and PyschENCODE, albeit stronger in PsychENCODE). GEMIN7 is a component of the core SMN complex (PMID: 12065586), which is required for pre-mRNA splicing in the nucleus.

4.4.2 Figures

ld_blocks <- 
  bivar_fdr$window_100000$ld_block %>% unique() %>% sort()

qtl_genes <-
  bivar_fdr$window_100000 %>%
  dplyr::filter(str_detect(phen2, "ENSG")) %>% 
  .[["gene_locus"]] %>% 
  unique()

bivar_list <- 
  setNames(
    bivar_fdr$window_100000 %>% 
      dplyr::group_split(ld_block),
    nm = 
      str_c("locus_", ld_blocks)
  )
  
ref <- import("/data/references/ensembl/gtf_gff3/v87/Homo_sapiens.GRCh37.87.gtf")
ref <- ref %>% keepSeqlevels(c(1:22), pruning.mode = "coarse") 
ref <- ref[ref$type == "gene"]

loci_gr <-
  LAVA::read.loci(here::here("results", "01_input_prep", "gwas_filtered.loci")) %>%
  dplyr::rename(locus = LOC) %>% 
  dplyr::filter(locus %in% bivar_fdr$window_100000$ld_block) %>% 
  GenomicRanges::makeGRangesFromDataFrame(
    .,
    keep.extra.columns = TRUE,
    ignore.strand = TRUE,
    seqinfo = NULL,
    seqnames.field = "CHR",
    start.field = "START",
    end.field = "STOP"
  )

fig_list <- vector(mode = "list", length = length(ld_blocks))

for(block in ld_blocks){
  
  locus_plot <- 
    plot_locus(
    locus_gr = loci_gr[loci_gr$locus == block], 
    ref = ref,
    highlight_gene = qtl_genes,
    highlight_gene_label = "QTL - local rg?"
    )
  
  edge_plot <- 
    plot_qtl_edge_diagram(
          bivar_corr_qtl = bivar_list[[str_c("locus_", block)]],
          phen = fct_disease,
          seed = 89
        )
  
  if(length(edge_plot)/4 >=2){
    
    height <- c(1,3.5)
    
  } else{

    height <- c(1,1)
  }
  
  fig_list[[which(ld_blocks == block)]] <- 
    ggpubr::ggarrange(
      locus_plot,
      ggpubr::ggarrange(
        plotlist = edge_plot,
        ncol = 4,
        nrow = ceiling(length(edge_plot)/4), 
        common.legend = TRUE,
        legend = "top"
      ),
      ncol = 1,
      labels = letters[1:2],
      heights = height
    )
  
    names(fig_list)[which(ld_blocks == block)] <- str_c("locus_", block)
  
}

fig_list[1:4]
## $locus_681
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.7: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_1273
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.8: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_1719
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.9: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_2281
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.10: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

fig_list[5]
## $locus_2351
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.11: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (FDR-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (FDR-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

4.5 Phenotype correlations at gene locus level (100 kb window) using stringent bonferroni multiple test correction

4.5.1 Text

  • Of the 43 eQTL-GWAS bivariate correlations observed using the lenient FDR correction, 19 survived Bonferroni correction.
  • Some of these included:
    • Locus 681
      • Negative \(r_{g}\) between AD and PD and between PD and eQTLGen-derived SNCA eQTLs. This was also significant when using a window size of 50 kb.
    • Locus 1273
      • Positive local \(r_{g}\) between PD and eQTLGen-derived ESCO2, PBK and PNOC eQTLs. Using a window size of 50 kb, associations with PBK and PNOC eQTLs were significant, while the association with ESCO2 eQTLs did not survive Bonferroni correction (but was significant using FDR).
    • Locus 1719
      • Positive local \(r_{g}\) between MDD and SCZ and negative local \(r_{g}\) between both neuropsychiatric disorders and eQTLGen-derived ANKK1 eQTLs. Using a window size of 50 kb, the association of MDD with ANKK1 eQTLs was significant, while the association of SCZ with ANKK1 eQTLs was nominal (Bonferroni-corrected p-value = 0.073).
      • Positve local \(r_{g}\) between BIP and eQTLGen-derived NCAM1 eQTLs. This was also significant when using a window size of 50 kb.
    • Locus 2281
      • Positive local \(r_{g}\) between MDD and PsychENCODE-derived RAB27B eQTLs. This was also significant when using a window size of 50 kb.
      • Negative local \(r_{g}\) between MDD and eQTLGen-derived TCF4 eQTLs. Using a window size of 50 kb, this association did not survive Bonferroni correction, but was significant using the lenient FDR cut-off.
    • Locus 2351 (of the 22 eQTL-GWAS bivariate correlations observed using the lenient FDR correction, 10 survived Bonferroni correction)
      • Positive local \(r_{g}\) between AD and eQTLGen- and PsychENCODE-derived GEMIN7 eQTLs. Using a window size of 50 kb, neither of these associations survived Bonferroni correction; however, the association between AD and PsychENCODE-derived GEMIN7 eQTLs was significant using the lenient FDR cut-off.
      • Positive local \(r_{g}\) between AD and eQTLGen-derived PVRL2 eQTLs. This was also significant when using a window size of 50 kb.

4.5.2 Figures

ld_blocks <- 
  bivar_bonf$window_100000 %>%
  dplyr::group_by(eqtl_dataset, gene_locus) %>%
  .[["ld_block"]] %>% 
  unique() %>% 
  sort()

qtl_genes <-
  bivar_bonf$window_100000 %>%
  dplyr::group_by(eqtl_dataset, gene_locus) %>%
  dplyr::filter(str_detect(phen2, "ENSG")) %>% 
  .[["gene_locus"]] %>% 
  unique()

bivar_list <- 
  setNames(
    bivar_bonf$window_100000 %>% 
      dplyr::group_split(ld_block),
    nm = 
      str_c("locus_", ld_blocks)
  )
  
fig_list <- vector(mode = "list", length = length(ld_blocks))

for(block in ld_blocks){
  
  locus_plot <- 
    plot_locus(
    locus_gr = loci_gr[loci_gr$locus == block], 
    ref = ref,
    highlight_gene = qtl_genes,
    highlight_gene_label = "QTL - local rg?"
    )
  
  edge_plot <- 
    plot_qtl_edge_diagram(
          bivar_corr_qtl = bivar_list[[str_c("locus_", block)]],
          phen = fct_disease,
          seed = 89
        )
  
  if(length(edge_plot)/4 >=2){
    
    height <- c(1,3.5)
    
  } else{

    height <- c(1,0.5)
  }
  
  fig_list[[which(ld_blocks == block)]] <- 
    ggpubr::ggarrange(
      locus_plot,
      ggpubr::ggarrange(
        plotlist = edge_plot,
        ncol = 4,
        nrow = ceiling(length(edge_plot)/4), 
        common.legend = TRUE,
        legend = "top"
      ),
      ncol = 1,
      labels = letters[1:2],
      heights = height
    )
  
    names(fig_list)[which(ld_blocks == block)] <- str_c("locus_", block)
  
}

fig_list[1:4]
## $locus_681
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.12: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_1273
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.13: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_1719
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.14: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

## 
## $locus_2281
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.15: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

fig_list[5]
## $locus_2351
(a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.

Figure 4.16: (a) Locus plot. Gene loci with a significant bivariate local genetic correlation between a disease phenotype and eQTL are highlighted in blue (Bonferroni-corrected p < 0.05). (b) Edge diagrams for gene loci where a significant bivariate local genetic correlation was observed between a disease phenotype and eQTL (Bonferroni-corrected p < 0.05). Edges display the standardised coefficient for genetic correlation (rho) for significant bivariate local genetic correlations, with negative and positive correlations indicated by blue and red colour, respectively. GWAS and eQTL nodes are indicated by grey and white fill, respectively.



5 Session info

## ─ Session info ───────────────────────────────────────────────────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 4.0.5 (2021-03-31)
##  os       Ubuntu 16.04.7 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language (EN)                        
##  collate  en_GB.UTF-8                 
##  ctype    en_GB.UTF-8                 
##  tz       Europe/London               
##  date     2022-02-01                  
## 
## ─ Packages ───────────────────────────────────────────────────────────────────────────────────────────────────────────
##  package              * version  date       lib source         
##  abind                  1.4-5    2016-07-21 [2] CRAN (R 4.0.5) 
##  assertthat             0.2.1    2019-03-21 [2] CRAN (R 4.0.5) 
##  backports              1.3.0    2021-10-27 [1] CRAN (R 4.0.5) 
##  Biobase                2.50.0   2020-10-27 [2] Bioconductor   
##  BiocGenerics         * 0.36.1   2021-04-16 [2] Bioconductor   
##  BiocParallel           1.24.1   2020-11-06 [2] Bioconductor   
##  Biostrings             2.58.0   2020-10-27 [2] Bioconductor   
##  bit                    4.0.4    2020-08-04 [2] CRAN (R 4.0.5) 
##  bit64                  4.0.5    2020-08-30 [2] CRAN (R 4.0.5) 
##  bitops                 1.0-7    2021-04-24 [2] CRAN (R 4.0.5) 
##  bookdown               0.22     2021-04-22 [2] CRAN (R 4.0.5) 
##  broom                  0.7.10   2021-10-31 [1] CRAN (R 4.0.5) 
##  bslib                  0.3.1    2021-10-06 [1] CRAN (R 4.0.5) 
##  car                    3.0-11   2021-06-27 [2] CRAN (R 4.0.5) 
##  carData                3.0-4    2020-05-22 [2] CRAN (R 4.0.5) 
##  cellranger             1.1.0    2016-07-27 [2] CRAN (R 4.0.5) 
##  chron                  2.3-56   2020-08-18 [1] CRAN (R 4.0.5) 
##  circlize             * 0.4.13   2021-06-09 [1] CRAN (R 4.0.5) 
##  cli                    3.1.0    2021-10-27 [1] CRAN (R 4.0.5) 
##  colorspace             2.0-2    2021-06-24 [2] CRAN (R 4.0.5) 
##  cowplot                1.1.1    2020-12-30 [2] CRAN (R 4.0.5) 
##  crayon                 1.4.2    2021-10-29 [1] CRAN (R 4.0.5) 
##  crosstalk              1.1.1    2021-01-12 [2] CRAN (R 4.0.5) 
##  curl                   4.3.2    2021-06-23 [2] CRAN (R 4.0.5) 
##  data.table             1.14.2   2021-09-27 [1] CRAN (R 4.0.5) 
##  DBI                    1.1.1    2021-01-15 [2] CRAN (R 4.0.5) 
##  dbplyr                 2.1.1    2021-04-06 [2] CRAN (R 4.0.5) 
##  DelayedArray           0.16.3   2021-03-24 [2] Bioconductor   
##  digest                 0.6.29   2021-12-01 [1] CRAN (R 4.0.5) 
##  dplyr                * 1.0.7    2021-06-18 [2] CRAN (R 4.0.5) 
##  DT                     0.19     2021-09-02 [1] CRAN (R 4.0.5) 
##  ellipsis               0.3.2    2021-04-29 [2] CRAN (R 4.0.5) 
##  evaluate               0.14     2019-05-28 [2] CRAN (R 4.0.5) 
##  fansi                  0.5.0    2021-05-25 [2] CRAN (R 4.0.5) 
##  farver                 2.1.0    2021-02-28 [2] CRAN (R 4.0.5) 
##  fastmap                1.1.0    2021-01-25 [2] CRAN (R 4.0.5) 
##  forcats              * 0.5.1    2021-01-27 [2] CRAN (R 4.0.5) 
##  foreign                0.8-81   2020-12-22 [2] CRAN (R 4.0.5) 
##  fs                     1.5.1    2021-11-30 [1] CRAN (R 4.0.5) 
##  generics               0.1.1    2021-10-25 [1] CRAN (R 4.0.5) 
##  GenomeInfoDb         * 1.26.7   2021-04-08 [2] Bioconductor   
##  GenomeInfoDbData       1.2.4    2021-07-03 [2] Bioconductor   
##  GenomicAlignments      1.26.0   2020-10-27 [2] Bioconductor   
##  GenomicRanges        * 1.42.0   2020-10-27 [2] Bioconductor   
##  ggforce                0.3.3    2021-03-05 [2] CRAN (R 4.0.5) 
##  ggplot2              * 3.3.5    2021-06-25 [2] CRAN (R 4.0.5) 
##  ggpubr                 0.4.0    2020-06-27 [2] CRAN (R 4.0.5) 
##  ggraph               * 2.0.5    2021-02-23 [1] CRAN (R 4.0.5) 
##  ggrepel                0.9.1    2021-01-15 [2] CRAN (R 4.0.5) 
##  ggsignif               0.6.3    2021-09-09 [1] CRAN (R 4.0.5) 
##  GlobalOptions          0.1.2    2020-06-10 [1] CRAN (R 4.0.5) 
##  glue                   1.5.1    2021-11-30 [1] CRAN (R 4.0.5) 
##  graphlayouts           0.7.1    2020-10-26 [1] CRAN (R 4.0.5) 
##  gridExtra              2.3      2017-09-09 [2] CRAN (R 4.0.5) 
##  gtable                 0.3.0    2019-03-25 [2] CRAN (R 4.0.5) 
##  haven                  2.4.3    2021-08-04 [1] CRAN (R 4.0.5) 
##  here                   1.0.1    2020-12-13 [1] CRAN (R 4.0.5) 
##  highr                  0.9      2021-04-16 [2] CRAN (R 4.0.5) 
##  hms                    1.1.1    2021-09-26 [1] CRAN (R 4.0.5) 
##  htmltools              0.5.2    2021-08-25 [1] CRAN (R 4.0.5) 
##  htmlwidgets            1.5.4    2021-09-08 [1] CRAN (R 4.0.5) 
##  httr                   1.4.2    2020-07-20 [2] CRAN (R 4.0.5) 
##  igraph                 1.2.7    2021-10-15 [1] CRAN (R 4.0.5) 
##  IRanges              * 2.24.1   2020-12-12 [2] Bioconductor   
##  jquerylib              0.1.4    2021-04-26 [2] CRAN (R 4.0.5) 
##  jsonlite               1.7.2    2020-12-09 [2] CRAN (R 4.0.5) 
##  knitr                  1.36     2021-09-29 [1] CRAN (R 4.0.5) 
##  labeling               0.4.2    2020-10-20 [2] CRAN (R 4.0.5) 
##  lattice                0.20-44  2021-05-02 [2] CRAN (R 4.0.5) 
##  LAVA                   0.0.6    2021-09-01 [1] xgit (@7be3421)
##  lifecycle              1.0.1    2021-09-24 [1] CRAN (R 4.0.5) 
##  lubridate              1.8.0    2021-10-07 [1] CRAN (R 4.0.5) 
##  magrittr               2.0.1    2020-11-17 [2] CRAN (R 4.0.5) 
##  MASS                   7.3-54   2021-05-03 [2] CRAN (R 4.0.5) 
##  Matrix                 1.3-4    2021-06-01 [2] CRAN (R 4.0.5) 
##  MatrixGenerics         1.2.1    2021-01-30 [2] Bioconductor   
##  matrixStats            0.61.0   2021-09-17 [1] CRAN (R 4.0.5) 
##  mgcv                   1.8-36   2021-06-01 [2] CRAN (R 4.0.5) 
##  modelr                 0.1.8    2020-05-19 [2] CRAN (R 4.0.5) 
##  munsell                0.5.0    2018-06-12 [2] CRAN (R 4.0.5) 
##  nlme                   3.1-152  2021-02-04 [2] CRAN (R 4.0.5) 
##  openxlsx               4.2.4    2021-06-16 [2] CRAN (R 4.0.5) 
##  pillar                 1.6.4    2021-10-18 [1] CRAN (R 4.0.5) 
##  pkgconfig              2.0.3    2019-09-22 [2] CRAN (R 4.0.5) 
##  polyclip               1.10-0   2019-03-14 [2] CRAN (R 4.0.5) 
##  purrr                * 0.3.4    2020-04-17 [2] CRAN (R 4.0.5) 
##  qdapTools              1.3.5    2020-04-17 [1] CRAN (R 4.0.5) 
##  R6                     2.5.1    2021-08-19 [1] CRAN (R 4.0.5) 
##  RColorBrewer           1.1-2    2014-12-07 [2] CRAN (R 4.0.5) 
##  Rcpp                   1.0.7    2021-07-07 [1] CRAN (R 4.0.5) 
##  RCurl                  1.98-1.5 2021-09-17 [1] CRAN (R 4.0.5) 
##  readr                * 2.0.2    2021-09-27 [1] CRAN (R 4.0.5) 
##  readxl                 1.3.1    2019-03-13 [2] CRAN (R 4.0.5) 
##  reprex                 2.0.0    2021-04-02 [2] CRAN (R 4.0.5) 
##  rio                    0.5.27   2021-06-21 [2] CRAN (R 4.0.5) 
##  rlang                  0.4.12   2021-10-18 [1] CRAN (R 4.0.5) 
##  rmarkdown              2.11     2021-09-14 [1] CRAN (R 4.0.5) 
##  rprojroot              2.0.2    2020-11-15 [2] CRAN (R 4.0.5) 
##  Rsamtools              2.6.0    2020-10-27 [2] Bioconductor   
##  rstatix                0.7.0    2021-02-13 [2] CRAN (R 4.0.5) 
##  rstudioapi             0.13     2020-11-12 [2] CRAN (R 4.0.5) 
##  rtracklayer          * 1.50.0   2020-10-27 [2] Bioconductor   
##  rvest                  1.0.1    2021-07-26 [2] CRAN (R 4.0.5) 
##  S4Vectors            * 0.28.1   2020-12-09 [2] Bioconductor   
##  sass                   0.4.0    2021-05-12 [2] CRAN (R 4.0.5) 
##  scales                 1.1.1    2020-05-11 [2] CRAN (R 4.0.5) 
##  sessioninfo          * 1.1.1    2018-11-05 [2] CRAN (R 4.0.5) 
##  shape                  1.4.6    2021-05-19 [1] CRAN (R 4.0.5) 
##  stringi                1.7.6    2021-11-29 [1] CRAN (R 4.0.5) 
##  stringr              * 1.4.0    2019-02-10 [2] CRAN (R 4.0.5) 
##  SummarizedExperiment   1.20.0   2020-10-27 [2] Bioconductor   
##  tibble               * 3.1.6    2021-11-07 [1] CRAN (R 4.0.5) 
##  tidygraph              1.2.0    2020-05-12 [1] CRAN (R 4.0.5) 
##  tidyr                * 1.1.4    2021-09-27 [1] CRAN (R 4.0.5) 
##  tidyselect             1.1.1    2021-04-30 [2] CRAN (R 4.0.5) 
##  tidyverse            * 1.3.1    2021-04-15 [1] CRAN (R 4.0.5) 
##  tweenr                 1.0.2    2021-03-23 [2] CRAN (R 4.0.5) 
##  tzdb                   0.2.0    2021-10-27 [1] CRAN (R 4.0.5) 
##  utf8                   1.2.2    2021-07-24 [2] CRAN (R 4.0.5) 
##  vctrs                  0.3.8    2021-04-29 [2] CRAN (R 4.0.5) 
##  viridis                0.6.2    2021-10-13 [1] CRAN (R 4.0.5) 
##  viridisLite            0.4.0    2021-04-13 [2] CRAN (R 4.0.5) 
##  vroom                  1.5.5    2021-09-14 [1] CRAN (R 4.0.5) 
##  withr                  2.4.3    2021-11-30 [1] CRAN (R 4.0.5) 
##  xfun                   0.27     2021-10-18 [1] CRAN (R 4.0.5) 
##  XML                    3.99-0.8 2021-09-17 [1] CRAN (R 4.0.5) 
##  xml2                   1.3.2    2020-04-23 [2] CRAN (R 4.0.5) 
##  XVector                0.30.0   2020-10-27 [2] Bioconductor   
##  yaml                   2.2.1    2020-02-01 [2] CRAN (R 4.0.5) 
##  zip                    2.2.0    2021-05-31 [2] CRAN (R 4.0.5) 
##  zlibbioc               1.36.0   2020-10-27 [2] Bioconductor   
## 
## [1] /home/rreynolds/R/x86_64-pc-linux-gnu-library/4.0
## [2] /opt/R/4.0.5/lib/R/library